智能论文笔记

Knowledge mining of unstructured information: application to cyber-domain

Tuomas Takko , Kunal Bhattacharya , Martti Lehto , Pertti Jalasvirta , Aapo Cederberg , Kimmo Kaski

分类：自然语言处理 | 机器学习

2021-09-08

许多开放的在线资料中，有关网络相关犯罪，事件和冲突的信息大量提供。但是，对分析师和专家来说，处理大量数据和数据流是一项具有挑战性的任务，并且需要对较新的方法和技术的需求。在本文中，我们介绍并实施了一个新颖的知识图和知识挖掘框架，以从有关网络域中事件的自由形式文本中提取相关信息。该框架包括基于机器学习的管道，用于生成具有非技术网络主页的组织，国家，行业，产品和攻击者的图形。提取的知识图用于估计给定图配置上的网络攻击的发生率。我们使用公开可用的实际网络材料报告收集来测试我们方法的功效。发现知识提取足够准确，基于图的威胁估计证明了与攻击实际记录的一定程度。在实际使用中，利用介绍框架的分析师可以从当前的网络景观中推断出各种实体的风险以及行业和国家之间风险启发式的风险。

translated by 谷歌翻译

Self-Supervised Object Segmentation with a Cut-and-Pasting GAN

Kunal Chaturvedi , Ali Braytee , Jun Li , Mukesh Prasad

分类：计算机视觉 | 机器学习

2023-01-01

This paper proposes a novel self-supervised based Cut-and-Paste GAN to perform foreground object segmentation and generate realistic composite images without manual annotations. We accomplish this goal by a simple yet effective self-supervised approach coupled with the U-Net based discriminator. The proposed method extends the ability of the standard discriminators to learn not only the global data representations via classification (real/fake) but also learn semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on the standard benchmark datasets.

translated by 谷歌翻译

Concentration of the Langevin Algorithm's Stationary Distribution

Jason M. Altschuler , Kunal Talwar

分类： (统计)机器学习 | 机器学习

2022-12-24

A canonical algorithm for log-concave sampling is the Langevin Algorithm, aka the Langevin Diffusion run with some discretization stepsize $\eta > 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $\pi_{\eta}$ which differs from the stationary distribution $\pi$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $\pi$ extend to $\pi_{\eta}$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $\pi$, the analogous properties for $\pi_{\eta}$ are open questions with direct algorithmic implications. This note provides a first step in this direction by establishing concentration results for $\pi_{\eta}$ that mirror classical results for $\pi$. Specifically, we show that for any nontrivial stepsize $\eta > 0$, $\pi_{\eta}$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution $\pi_{\eta}$ without going through the continuous-time stationary distribution $\pi$ as an intermediary.

translated by 谷歌翻译

Cross-Domain Consumer Review Analysis

Aditya Pandey , Kunal Joshi

分类：机器学习

2022-12-23

The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.

translated by 谷歌翻译

Task Ambiguity in Humans and Language Models

Alex Tamkin , Kunal Handa , Avash Shrestha , Noah Goodman

分类：自然语言处理 | 机器学习

2022-12-20

Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions with varying degrees of ambiguity, and 2) different numbers of labeled examples. We find that the combination of model scaling (to 175B parameters) and training with human feedback data enables models to approach or exceed the accuracy of human participants across tasks, but that either one alone is not sufficient. In addition, we show how to dramatically improve the accuracy of language models trained without large-scale human feedback training by finetuning on a small number of ambiguous in-context examples, providing a promising direction for teaching models to generalize well in the face of ambiguity.

translated by 谷歌翻译

Do I have the Knowledge to Answer? Investigating Answerability of Knowledge Base Questions

Mayur Patidar , Avinash Singh , Prayushi Faldu , Lovekesh Vig , Indrajit Bhattacharya , Mausam

分类：自然语言处理 | 人工智能

2022-12-20

When answering natural language questions over knowledge bases (KBs), incompleteness in the KB can naturally lead to many questions being unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We first identify various forms of KB incompleteness that can result in a question being unanswerable. We then propose GrailQAbility, a new benchmark dataset, which systematically modifies GrailQA (a popular KBQA dataset) to represent all these incompleteness issues. Testing two state-of-the-art KBQA models (trained on original GrailQA as well as our GrailQAbility), we find that both models struggle to detect unanswerable questions, or sometimes detect them for the wrong reasons. Consequently, both models suffer significant loss in performance, underscoring the need for further research in making KBQA systems robust to unanswerability.

translated by 谷歌翻译

AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Aowabin Rahman , Arnab Bhattacharya , Thiagarajan Ramachandran , Sayak Mukherjee , Himanshu Sharma , Ted Fujimoto , Samrat Chatterjee

分类：机器人 | 机器学习

2022-12-20

Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.

translated by 谷歌翻译

Task Preferences across Languages on Community Question Answering Platforms

Sebastin Santy , Prasanta Bhattacharya , Rishabh Mehrotra

分类：自然语言处理

2022-12-18

With the steady emergence of community question answering (CQA) platforms like Quora, StackExchange, and WikiHow, users now have an unprecedented access to information on various kind of queries and tasks. Moreover, the rapid proliferation and localization of these platforms spanning geographic and linguistic boundaries offer a unique opportunity to study the task requirements and preferences of users in different socio-linguistic groups. In this study, we implement an entity-embedding model trained on a large longitudinal dataset of multi-lingual and task-oriented question-answer pairs to uncover and quantify the (i) prevalence and distribution of various online tasks across linguistic communities, and (ii) emerging and receding trends in task popularity over time in these communities. Our results show that there exists substantial variance in task preference as well as popularity trends across linguistic communities on the platform. Findings from this study will help Q&A platforms better curate and personalize content for non-English users, while also offering valuable insights to businesses looking to target non-English speaking communities online.

translated by 谷歌翻译

Elixir: A system to enhance data quality for multiple analytics on a video stream

Sibendu Paul , Kunal Rao , Giuseppe Coviello , Murugan Sankaradas , Oliver Po , Y. Charlie Hu , Srimat T. Chakradhar

分类：计算机视觉

2022-12-08

IoT sensors, especially video cameras, are ubiquitously deployed around the world to perform a variety of computer vision tasks in several verticals including retail, healthcare, safety and security, transportation, manufacturing, etc. To amortize their high deployment effort and cost, it is desirable to perform multiple video analytics tasks, which we refer to as Analytical Units (AUs), off the video feed coming out of every camera. In this paper, we first show that in a multi-AU setting, changing the camera setting has disproportionate impact on different AUs performance. In particular, the optimal setting for one AU may severely degrade the performance for another AU, and further the impact on different AUs varies as the environmental condition changes. We then present Elixir, a system to enhance the video stream quality for multiple analytics on a video stream. Elixir leverages Multi-Objective Reinforcement Learning (MORL), where the RL agent caters to the objectives from different AUs and adjusts the camera setting to simultaneously enhance the performance of all AUs. To define the multiple objectives in MORL, we develop new AU-specific quality estimator values for each individual AU. We evaluate Elixir through real-world experiments on a testbed with three cameras deployed next to each other (overlooking a large enterprise parking lot) running Elixir and two baseline approaches, respectively. Elixir correctly detects 7.1% (22,068) and 5.0% (15,731) more cars, 94% (551) and 72% (478) more faces, and 670.4% (4975) and 158.6% (3507) more persons than the default-setting and time-sharing approaches, respectively. It also detects 115 license plates, far more than the time-sharing approach (7) and the default setting (0).

translated by 谷歌翻译

Adaptive ECCM for Mitigating Smart Jammers

Kunal Pattanayak , Shashwat Jain , Vikram Krishnamurthy , Chris Berry

分类：机器学习

2022-12-05

This paper considers adaptive radar electronic counter-counter measures (ECCM) to mitigate ECM by an adversarial jammer. Our ECCM approach models the jammer-radar interaction as a Principal Agent Problem (PAP), a popular economics framework for interaction between two entities with an information imbalance. In our setup, the radar does not know the jammer's utility. Instead, the radar learns the jammer's utility adaptively over time using inverse reinforcement learning. The radar's adaptive ECCM objective is two-fold (1) maximize its utility by solving the PAP, and (2) estimate the jammer's utility by observing its response. Our adaptive ECCM scheme uses deep ideas from revealed preference in micro-economics and principal agent problem in contract theory. Our numerical results show that, over time, our adaptive ECCM both identifies and mitigates the jammer's utility.

translated by 谷歌翻译